I have two favorite baseball teams: the Arizona Diamondbacks and the Detroit Tigers. I wanted to know which one is the better team, so I took each team's current lineup (9 batters) and calculated, for each batter, the probability of each of the following outcomes: out, single, double, triple, and home run.
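As a rough illustration of how such probabilities could be derived, here is a minimal sketch that turns one batter's season counts into the five outcome probabilities. The counts are hypothetical placeholders, and the calculation assumes every at-bat ends in either a hit or an out (walks, sacrifices, and so on are ignored); the actual preparation of the CSV files used below may have differed.

```r
# Hypothetical season totals for one batter (placeholder numbers, not real player data)
at_bats  <- 500
hits     <- 140
doubles  <- 30
triples  <- 3
homeruns <- 20

# Singles are the hits that are not extra-base hits; outs are at-bats without a hit
singles <- hits - doubles - triples - homeruns # 87
outs    <- at_bats - hits                      # 360

# Divide each count by at-bats so the five probabilities sum to 1
probs <- c(Outs = outs, Single = singles, Double = doubles,
           Triple = triples, Homerun = homeruns) / at_bats
probs
```

The names of the vector match the column names selected from the CSV files (Outs, Single, Double, Triple, Homerun).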
library(tidyverse)
library(plotly)
# Diamondback Player Data
Dimondbacks <- read.csv("Diamondbacks.csv")
# Creating a list with one element per batter (lineup positions 1 to 9)
Arizona <- lapply(1:9, function(i) {
  Dimondbacks %>%
    filter(Player == i) %>%
    select(Outs, Single, Double, Triple, Homerun)
})
# Reading Tiger Data in
Tigers <- read.csv("Tigers.csv")
# Creating a list with one element per batter (lineup positions 1 to 9)
Detroit <- lapply(1:9, function(i) {
  Tigers %>%
    filter(Player == i) %>%
    select(Outs, Single, Double, Triple, Homerun)
})
To figure out the outcome of an at-bat for a specific player, Brother Rose and I created a function that takes a probability vector and a random number between 0 and 1. If the random number is less than the probability that the player makes an out, the function returns 0 (the number of bases earned). If it is less than the sum of the probabilities of an out and a single, the function returns 1. It continues adding probabilities in this way up to the probability of hitting a triple; if the random number is greater than or equal to that sum, the function returns 4.
atbatresult <- function(statvec_, rU01) {
if (rU01 < statvec_$Outs){
return(0)
} else if( rU01 < statvec_$Outs + statvec_$Single) {
return(1)
} else if(rU01 < statvec_$Outs + statvec_$Single + statvec_$Double){
return(2)
} else if (rU01 < statvec_$Outs + statvec_$Single + statvec_$Double + statvec_$Triple){
return(3)
} else {
return(4)
}}
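To see the thresholds at work, here is a small sketch of the same cumulative-probability idea written with `cumsum()` and `findInterval()`. The probabilities are hypothetical placeholders rather than values from the CSV files, and `bases_for` is an illustrative helper name, not part of the original code.

```r
# Hypothetical outcome probabilities for one batter (placeholder values)
probs <- c(Outs = 0.70, Single = 0.18, Double = 0.06, Triple = 0.01, Homerun = 0.05)

# Cumulative thresholds: 0.70, 0.88, 0.94, 0.95, 1.00
thresholds <- cumsum(probs)

# Bases earned for a uniform draw u: the number of the first four
# thresholds that u has reached (0 = out, 4 = home run)
bases_for <- function(u) findInterval(u, thresholds[1:4])

bases_for(0.50) # 0: below 0.70, so an out
bases_for(0.75) # 1: between 0.70 and 0.88, so a single
bases_for(0.99) # 4: above 0.95, so a home run

# Over many draws, the share of each outcome should be close to probs
set.seed(1)
draws <- bases_for(runif(1e5))
prop.table(table(draws))
```

Because `findInterval()` is vectorized, this version can classify many random draws at once, which is convenient for checking that the simulated frequencies match the input probabilities.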
To simulate a full game, I needed to break it up into smaller parts and build on each part. So I started by simulating one inning for the Arizona Diamondbacks. The number returned is the total runs obtained in one inning for the Diamondbacks, approximated as the total bases earned in the inning divided by 4, rounded down.
Out <- 0
Totalbases <- 0
Totalbatters <- 0
x <- 1 # Start at the top of the team's lineup
while(Out < 3){ # Number of outs allowed in one inning
  outcome <- atbatresult(Arizona[[x]], runif(1)) # Obtain the outcome for the current batter
  if(outcome == 0){
    Out <- Out + 1 # Increase the number of outs that were obtained in the inning
  } else {
    Totalbases <- Totalbases + outcome # Number of bases obtained in the inning
  }
  Totalbatters <- Totalbatters + 1 # Number of batters that appeared at the plate
  x <- x %% 9 + 1 # Move to the next batter, wrapping from 9 back to 1
}
ArizonaRuns <- floor(Totalbases/4) # The number of runs scored in the inning
ArizonaRuns
## [1] 1
To simulate one complete game, I need to simulate 9 innings. So I took the previous code and placed it inside a for loop that runs the simulation for 9 innings. Doing this for both teams and combining the results gives a box score for the game.
# Arizona Diamondbacks
Totalbatters <- 0
AZ_Dbacks <- vector(mode = "list", length = 9)
ArizonaTotal <- 0
x <- 1 # Lineup position of the next batter; carries over between innings
for(Inning in 1:9){ # Goes for one complete game
  Out <- 0
  Totalbases <- 0
  while(Out < 3){
    outcome <- atbatresult(Arizona[[x]], runif(1))
    if(outcome == 0){
      Out <- Out + 1
    } else {
      Totalbases <- Totalbases + outcome
    }
    Totalbatters <- Totalbatters + 1
    x <- x %% 9 + 1 # Next batter, wrapping from 9 back to 1
  }
  ArizonaRuns <- floor(Totalbases/4)
  ArizonaTotal <- ArizonaTotal + ArizonaRuns
  AZ_Dbacks[[Inning]] <- ArizonaRuns # Runs scored in this inning
}
# Detroit Tigers
Totalbatters <- 0
MI_Tigers <- vector(mode = "list", length = 9)
DetroitTotal <- 0
x <- 1 # Lineup position of the next batter; carries over between innings
for(Inning in 1:9){ # Goes for one complete game
  Out <- 0
  Totalbases <- 0
  while(Out < 3){
    outcome <- atbatresult(Detroit[[x]], runif(1))
    if(outcome == 0){
      Out <- Out + 1
    } else {
      Totalbases <- Totalbases + outcome
    }
    Totalbatters <- Totalbatters + 1
    x <- x %% 9 + 1 # Next batter, wrapping from 9 back to 1
  }
  DetroitRuns <- floor(Totalbases/4)
  DetroitTotal <- DetroitTotal + DetroitRuns
  MI_Tigers[[Inning]] <- DetroitRuns # Runs scored in this inning
}
BoxScore <- rbind(AZ_Dbacks, MI_Tigers)
BoxScore
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## AZ_Dbacks 1 0 0 0 1 1 1 0 0
## MI_Tigers 0 1 0 1 0 1 0 0 0
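Since the two team blocks differ only in which lineup they use, the per-game logic could be wrapped in a function. The following is only a sketch under stated assumptions: `draw_bases` and `simulate_game` are illustrative names, the lineup is a list of plain probability vectors rather than the one-row data frames built above, and the batters are hypothetical placeholders. It keeps the same approximation of runs as total bases divided by 4, rounded down.

```r
# Draw one at-bat: bases earned (0 = out) for a probability vector
# ordered as Outs, Single, Double, Triple, Homerun
draw_bases <- function(probs) {
  findInterval(runif(1), cumsum(probs)[1:4])
}

# Simulate a game for one lineup (a list of probability vectors) and
# return the runs per inning, approximated as floor(total bases / 4)
simulate_game <- function(lineup, innings = 9) {
  runs <- integer(innings)
  batter <- 1 # lineup position carries over from inning to inning
  for (inning in seq_len(innings)) {
    outs <- 0
    bases <- 0
    while (outs < 3) {
      result <- draw_bases(lineup[[batter]])
      if (result == 0) {
        outs <- outs + 1
      } else {
        bases <- bases + result
      }
      batter <- batter %% length(lineup) + 1 # wrap back to the first batter
    }
    runs[inning] <- bases %/% 4
  }
  runs
}

# Hypothetical lineup: nine identical batters (placeholder values, not real player data)
toy_batter <- c(Outs = 0.70, Single = 0.18, Double = 0.06, Triple = 0.01, Homerun = 0.05)
toy_lineup <- rep(list(toy_batter), 9)

set.seed(42)
ToyBoxScore <- rbind(TeamA = simulate_game(toy_lineup),
                     TeamB = simulate_game(toy_lineup))
ToyBoxScore
```

To use this with the lists built from the CSV files, each one-row data frame would likely need to be converted to a plain numeric vector first, for example with `unlist()`.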
Next, I wanted to simulate not just one game but 10,000 games. I made a for loop that runs the previous code 10,000 times and adds the totals for both teams to a data frame. Here is a sample of what the data frame looks like for the first 10 games.
Games <- 10^4
# One row per game, preallocated with the final column names
TotalRunsPerGame <- data.frame(Dbacks = numeric(Games), Tigers = numeric(Games))
for(z in 1:Games){
  # Arizona Diamondbacks
  ArizonaTotal <- 0
  x <- 1 # Lineup position of the next batter; carries over between innings
  for(Inning in 1:9){ # Goes for one complete game
    Out <- 0
    Totalbases <- 0
    while(Out < 3){
      outcome <- atbatresult(Arizona[[x]], runif(1))
      if(outcome == 0){
        Out <- Out + 1
      } else {
        Totalbases <- Totalbases + outcome
      }
      x <- x %% 9 + 1 # Next batter, wrapping from 9 back to 1
    }
    ArizonaTotal <- ArizonaTotal + floor(Totalbases/4)
  }
  # Detroit Tigers
  DetroitTotal <- 0
  x <- 1
  for(Inning in 1:9){
    Out <- 0
    Totalbases <- 0
    while(Out < 3){
      outcome <- atbatresult(Detroit[[x]], runif(1))
      if(outcome == 0){
        Out <- Out + 1
      } else {
        Totalbases <- Totalbases + outcome
      }
      x <- x %% 9 + 1
    }
    DetroitTotal <- DetroitTotal + floor(Totalbases/4)
  }
  TotalRunsPerGame[z,] <- c(ArizonaTotal, DetroitTotal) # Final score of game z
}
TotalRunsPerGame[1:10,]
## Dbacks Tigers
## 1 2 3
## 2 7 1
## 3 1 1
## 4 5 2
## 5 6 1
## 6 3 2
## 7 5 1
## 8 7 5
## 9 5 0
## 10 4 1
The first graphic shows the final scores for the first 100 games between the Diamondbacks and Tigers, and the second graphic shows the final scores for all 10,000 games. The red line marks games that ended in a tie. Dots above the red line are games the Tigers won, and dots below the line are games the Diamondbacks won. This is interesting to look at because it shows the spread of how each team scores runs.
TotalRunsPerGame <- TotalRunsPerGame %>%
mutate(Winner = ifelse(Dbacks > Tigers, "Dbacks",ifelse(Tigers > Dbacks, "Tigers", "Tie")))
TotalRunsPerGame2 <- TotalRunsPerGame %>%
slice(1:100)
ggplot(TotalRunsPerGame2, aes(x = Dbacks, y = Tigers))+
geom_point(aes(col = Winner), size = 2)+
geom_abline(intercept = 0, slope = 1, col = "firebrick")+
labs(title = "Scores For Each Game Simulated (100)", x = "Dbacks Total Runs", y = "Tigers Total Runs")
g <- ggplot(TotalRunsPerGame, aes(x = Dbacks, y = Tigers))+
geom_point(aes(col = Winner))+
geom_abline(intercept = 0, slope = 1, col = "firebrick")+
labs(title = "Scores For Each Game Simulated (10,000)", x = "Dbacks Total Runs", y = "Tigers Total Runs")
ggplotly(g)
This graphic shows how many times each team won and how many times the two teams tied. From the graphic we can see that the Arizona Diamondbacks win more often in this matchup.
h <- ggplot(TotalRunsPerGame, aes(x = Winner))+
geom_bar(aes(fill = Winner), col = "black")+
labs(title = "Number of Wins per Team", y = "Number of Wins")
ggplotly(h)
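The same counts can also be read off numerically with `table()` and `prop.table()`. As a small sketch, the block below applies this to only the ten sample games printed earlier, so the proportions illustrate the calculation and are not the result for all 10,000 games; `sample_games` is an illustrative name.

```r
# The ten sample games printed earlier in the post
sample_games <- data.frame(
  Dbacks = c(2, 7, 1, 5, 6, 3, 5, 7, 5, 4),
  Tigers = c(3, 1, 1, 2, 1, 2, 1, 5, 0, 1)
)

# Same winner rule as in the plotting code
sample_games$Winner <- ifelse(sample_games$Dbacks > sample_games$Tigers, "Dbacks",
                       ifelse(sample_games$Tigers > sample_games$Dbacks, "Tigers", "Tie"))

wins <- table(sample_games$Winner) # Dbacks 8, Tie 1, Tigers 1
wins
prop.table(wins) # share of games per outcome
```

Running the last three lines on the `Winner` column of the full data frame would give the win share for each team across all simulated games.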
In a game of baseball there are no ties. The game continues into extra innings until one team has more runs than the other. For simplicity, this model doesn’t take that into account and simply records those games as ties. The opposing team's defense can also commit an error, and I didn’t take that into account either. To include it, you would need to calculate the probability of an error for each team and add an if statement: if a random number is less than the probability of committing an error, the runner is safe; if it is greater, the runner is out. There are also plenty of other baseball rules that could be incorporated into this model.
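The error rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `apply_error` and `error_prob` are hypothetical names, the 2% error rate is a placeholder rather than a real team statistic, and a batter who reaches on an error is assumed to earn exactly one base.

```r
# Adjust an at-bat result for a possible fielding error.
# bases:      result of the at-bat (0 = out, 1 to 4 = bases earned)
# error_prob: hypothetical probability that the defense commits an error
# u:          random number between 0 and 1
apply_error <- function(bases, error_prob, u = runif(1)) {
  if (bases == 0 && u < error_prob) {
    return(1) # the out becomes a safe runner on first base (an assumption)
  }
  bases # otherwise the original result stands
}

apply_error(0, error_prob = 0.02, u = 0.01) # 1: error, runner is safe
apply_error(0, error_prob = 0.02, u = 0.50) # 0: no error, runner is out
apply_error(2, error_prob = 0.02, u = 0.01) # 2: hits are unaffected
```

In the inning loop, the result of each at-bat would be passed through this function before the out and base counters are updated.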